
AFHRL-TP-85-51 


JOB  PERFORMANCE  MEASUREMENT  CLASSIFICATION  SCHEME 
FOR  VALIDATION  RESEARCH  IN  THE  MILITARY 


AIR FORCE


Michael  J.  Kavanagh 
School of Business

State University of New York at Albany
Albany, New York 12222

Walter C. Borman

Personnel Decisions Research Institute
43 Main Street, S.E.

Suite 405

Minneapolis, Minnesota 55414

Jerry W. Hedge
R.  Bruce  Gould 


MANPOWER  AND  PERSONNEL  DIVISION 
Brooks Air Force Base, Texas 78235-5601


February 1986

Interim Paper for Period May 1982 - November 1982




Approved  for  public  release;  distribution  unlimited. 


LABORATORY 


AIR  FORCE  SYSTEMS  COMMAND 
BROOKS  AIR  FORCE  BASE,  TEXAS  78235-5601 


NOTICE


When Government drawings, specifications, or other data are used for any
purpose other than in connection with a definitely Government-related
procurement, the United States Government incurs no responsibility or any
obligation whatsoever. The fact that the Government may have formulated or
in any way supplied the said drawings, specifications, or other data, is
not to be regarded by implication, or otherwise in any manner construed, as
licensing the holder, or any other person or corporation; or as conveying
any rights or permission to manufacture, use, or sell any patented
invention that may in any way be related thereto.

The Public Affairs Office has reviewed this paper, and it is releasable to
the National Technical Information Service, where it will be available to
the general public, including foreign nationals.

This paper has been reviewed and is approved for publication.


NANCY  GUINN  VITOLA,  Technical  Director 
Manpower  and  Personnel  Division 


RONALD  L.  KERCHNER,  Colonel,  USAF 
Chief,  Manpower  and  Personnel  Division 


REPORT DOCUMENTATION PAGE

1a. REPORT SECURITY CLASSIFICATION: Unclassified
1b. RESTRICTIVE MARKINGS:
2a. SECURITY CLASSIFICATION AUTHORITY:
2b. DECLASSIFICATION/DOWNGRADING SCHEDULE:
3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release; distribution unlimited.
4. PERFORMING ORGANIZATION REPORT NUMBER(S):
5. MONITORING ORGANIZATION REPORT NUMBER(S): AFHRL-TP-85-51

6a. NAME OF PERFORMING ORGANIZATION: McFann-Gray & Associates, Inc.
6b. OFFICE SYMBOL (If applicable):
6c. ADDRESS (City, State, and ZIP Code): 2100 Garden Road, Suite J, Monterey, California 93940

7a. NAME OF MONITORING ORGANIZATION: Manpower and Personnel Division
7b. ADDRESS (City, State, and ZIP Code): Air Force Human Resources Laboratory, Brooks Air Force Base, Texas 78235-5601

8a. NAME OF FUNDING/SPONSORING ORGANIZATION: Air Force Human Resources Laboratory
8b. OFFICE SYMBOL (If applicable): HQ AFHRL
8c. ADDRESS (City, State, and ZIP Code): Brooks Air Force Base, Texas 78235-5601

9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER: F41689-81-C-0022
10. SOURCE OF FUNDING NUMBERS: PROGRAM 62703F; PROJECT NO. 7719; TASK NO. 18

11. TITLE (Include Security Classification): Job Performance Measurement Classification Scheme for Validation Research in the Military

12. PERSONAL AUTHOR(S): Kavanagh, Michael J.; Borman, Walter C.; Hedge, Jerry W.; Gould, R. Bruce

13b. TIME COVERED: FROM May 82 TO [illegible]
14. DATE OF REPORT (Year, Month, Day): February 1986

COSATI CODES

GROUP SUB-GROUP

09

10

18. SUBJECT TERMS (Continue on reverse if necessary and identify by block number)
criterion development
job performance assessment
performance measurement

19. ABSTRACT (Continue on reverse if necessary and identify by block number)

This paper outlines the development of a performance measurement classification scheme. Its major focus is
on the quality of measurement when the purpose is for validation research in the military. It describes the
background and purpose for development of the classification scheme and explains the information and procedures
used with this approach. In order to develop a measurement methodology for job performance, a classification
scheme of job performance measurement quality was needed (a) to summarize and organize research progress in
terms of previous empirical work and (b) to identify future research needs. A more detailed presentation of
this conceptually based model will be available in a forthcoming report, which will include a literature review
and specific directions for future job performance measurement research.

20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: [X] UNCLASSIFIED/UNLIMITED  [ ] SAME AS RPT  [ ] DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION: Unclassified
22a. NAME OF RESPONSIBLE INDIVIDUAL: Nancy A. Perrigo, Chief, STINFO Office
22b. TELEPHONE (Include Area Code): (512) 536-3877
22c. OFFICE SYMBOL: AFHRL/TSR

DD FORM 1473, 84 MAR
83 APR edition may be used until exhausted.
All other editions are obsolete.

SECURITY CLASSIFICATION OF THIS PAGE: Unclassified


AFHRL  Technical  Paper  85-51 


February  1986 


JOB  PERFORMANCE  MEASUREMENT  CLASSIFICATION  SCHEME 
FOR  VALIDATION  RESEARCH  IN  THE  MILITARY 


Michael  J.  Kavanagh 
School  of  Business 

State University of New York at Albany
Albany, New York 12222


Walter C. Borman


Personnel Decisions Research Institute
43 Main Street, S.E.

Suite 405

Minneapolis, Minnesota 55414


Jerry W. Hedge
R.  Bruce  Gould 

MANPOWER  AND  PERSONNEL  DIVISION 
Brooks  Air  Force  Base,  Texas  78235-5601 


Reviewed  by 

Rodger  Ballentine,  Lt  Colonel,  USAF 
Chief, Productivity and Performance Measurement Function


Submitted for publication by
R. Bruce Gould

Chief, Force Utilization Branch
Manpower and Personnel Division


This report is primarily a working paper. It is published solely to document work performed.


SUMMARY


This  paper  provides  an  overview  of  the  background  and  development  of  a  classification  scheme 
of  job  performance  measurement  quality  to  be  used  for  "research  purposes  only."  It  describes,  in 
brief  form,  the  development  of  a  conceptual  framework  to  be  used  in  the  planning  and  conduct  of  a 
long-term  research  and  development  effort  to  obtain  job  performance  criterion  measures  in  the 
military. 


PREFACE 


This paper describes the initiation of a long-term program of research and
development focusing on job performance criterion development. As such, it represents
the Air Force's initial contribution to a Joint-Service effort to test the feasibility
of linking enlistment standards and job performance. A much more detailed presentation
of the Air Force approach to this research effort will be included in a forthcoming
Technical Report.

The work was performed by McFann-Gray and Associates, Inc., under contract
F41689-81-C-0022 from the Air Force Human Resources Laboratory (AFHRL) Manpower and
Personnel Division. The work unit number for AFHRL was 77191821. Dr. R. Bruce Gould
was the AFHRL Contract Monitor.


JOB PERFORMANCE MEASUREMENT CLASSIFICATION SCHEME
FOR  VALIDATION  RESEARCH  IN  THE  MILITARY 


I.  INTRODUCTION 

Over the past 2 years, reviews of the research and development (R&D) thrusts of the Manpower
and Personnel Division of the Air Force Human Resources Laboratory (AFHRL) have strongly
recommended that a major criterion development effort be initiated. Many individual R&D projects
were involved in the development of job performance criteria to validate selection,
classification, and training systems. This was thought to be inefficient, and a unified effort
to develop a strategy for job performance criterion development was recommended for all future
R&D projects.

A second driving force behind the development of a criterion measurement methodology was the
recognition that the Armed Services Vocational Aptitude Battery (ASVAB) must be validated against
on-the-job performance. Historically, the ASVAB (and its predecessors) has been validated
predominantly against training school success. The need to establish ASVAB relevance to job
performance increased when Congress began asking why the ASVAB was not validated against this
criterion.

In  their  1982  recommendation  regarding  the  research  thrusts  in  the  Manpower  and  Personnel 
Division,  a  Research  Advisory  Panel  composed  of  scientist  representatives  from  the  Department  of 
Defense  (DOD),  academia,  and  industry  noted  that  it  was  not  necessary  for  the  Laboratory  to 
"start  from  scratch"  in  terms  of  this  criterion  development  effort.  A  number  of  measurement 
characteristics  have  been  well-established  in  the  empirical  literature  regarding  criterion 
development.  The  panel  recommended  close  examination  of  this  literature  as  a  basis  for 
development  of  a  structural  model  of  job  performance  measurement  quality  to  guide  a  research 
program  to  develop  a  criterion  measurement  methodology  for  the  military.  This  paper  describes 
the first step in the model-building process, namely a conceptual approach to the organization of
the  literature  and  the  generation  of  research  initiatives. 

Implementation  of  these  recommendations  required  extensive  resources  to  completely  research 
and  develop  a  performance  measurement  system  for  use  in  validation  research.  After  about  9 
months  of  research  planning,  it  appeared  highly  probable  that  resources  necessary  for  this 
project  were  going  to  be  made  available  starting  in  fiscal  year  1983  for  a  long-term  effort. 
Availability  of  these  resources  was  expected  to  assure  considerable  reduction  in  magnitude  of 
"the criterion problem," at least in terms of validation research in the military.

There  are  two  reasons  why  the  "criterion  problem"  has  persisted.  First,  it  has  been  a  matter 
of  convenience  for  some  researchers,  who  use  it  to  indirectly  justify  their  use  of  easily 
accessible  and  often-used  measures  of  job  performance  when  they  conduct  validation  research. 
Second,  and  more  importantly,  resolving  the  criterion  problem  involves  extensive  resources;  in 
the  past,  no  one  has  been  willing  to  commit  that  needed  level  of  resources  to  conduct  the 
research. Thus, this planned AFHRL criterion research effort will serve two purposes: (a) it
will  develop  a  methodology  for  job  performance  measurement  in  the  military,  and  (b)  it  will 
contribute  to  scientific  research  on  performance  measurement  in  the  broader  scientific  community. 


II.  CLASSIFICATION  SCHEME 

The overall purpose of this program of R&D in the AFHRL is the development of a measurement
methodology for job performance. To accomplish this purpose, it is necessary to develop a
classification scheme for variables related to job performance measurement quality which (a)
summarizes and organizes previous empirical research and (b) identifies future research needs.
The conceptual framework is derived from the theoretical and empirical literature in the field of
psychology. Figure 1 is a graphic depiction of the variables in this conceptual framework. Note
that this is a general performance measurement classification scheme. A performance measurement
schema for the purpose of validation research only will be discussed later. It is necessary to
briefly cover the more general case in order to understand the model-building process.

The general classification scheme was developed using a combination of several approaches.
By making performance measurement quality the outcome variable in Figure 1, the research focus
centered on those variables that have, or can have, an impact on the quality of the measure. In
evaluating the literature, these variables were identified and placed in the schema. The input
variables in the first box on the left side of Figure 1 are those characteristics of the
individual or measurement system that can affect the quality of the measurement. The process
variables in the center of the figure reflect the current thinking in the performance measurement
literature that these variables play an important and pervasive role in the appraisal process.
There has been a recent emphasis on cognitive variables (variables of importance in the
decision-making process) (Feldman, 1981; Landy & Farr, 1980), as well as the
acceptability/confidence users have in the system (Dipboye & de Pontbriand, 1981; Kavanagh &
Hedge, 1983; Landy, Barnes, & Murphy, 1978) and their hypothesized effects on measurement
quality. In addition, the motivation the raters and ratees bring to the appraisal process
(DeCotiis & Petit, 1978) and their trust in the appraisal system (Bernardin, Orban, & Carlyle,
1981) are considered important process variables. Although not much empirical evidence has been
reported in the literature with respect to the role of these variables, the indications are that
they act as intervening process variables. Thus, while the individual/system characteristics
have been recognized as influencing measurement quality, these intervening process variables are
hypothesized to be functionally related to both the independent and dependent variables;
therefore, for purposes of classification, these variables have been separated from the other
variables.

Note that the cognitive variables have been placed outside the main causal path. This
indicates that these variables may not always play an important role in the appraisal process.
When the measurement system relies heavily on human judgment, such as with ratings or trained
observers, these variables would be expected to influence measurement quality. When human
judgment is not as important, such as with productivity counts or absences, the impact of these
cognitive variables on the dependent variable should be greatly reduced.

Note further in Figure 1 that the cognitive variables have been divided into two categories.
The first category involves input and storage of information, which is primarily concerned with
the observational heuristics that people use when gathering information about an individual's job
performance. The other category involves judgment or decision heuristics that people use in
assigning a quantitative index to the performance of a person on the job.

A brief example will illustrate the hypothesized relationships between these variables in the
conceptual framework. A general hypothesis underlying this conceptualization is that the more
complex (i.e., sophisticated, yet not necessarily cumbersome) the observational and/or decision
heuristics used, the higher the quality of the performance measurement. However,
individual/system characteristics could affect the complexity of these cognitive processes and
thus lower or raise the quality of the measures. The best example for the military is in terms
of the impact of organizational or unit norms in the current operational performance measurement
systems. A strong norm exists to give enlisted military personnel high ratings (i.e., an "8" or
a "9") on their performance evaluations. Regardless of the cause of this norm, its effect is to
simplify the rater's cognitive processes, at least in terms of completing these annual
evaluations. A rater, in this situation, may or may not use complex observational heuristics;
however, the rater's decision heuristic is simple: "8" or "9."
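
The effect of such a norm on measurement quality can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a procedure from the AFHRL program: the 1-to-9 scale mapping, the noise level, the sample size, and the function names (full_scale_rating, normed_rating) are all hypothetical, and "quality" is taken here simply as the correlation between ratings and simulated true performance.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def full_scale_rating(impression):
    """Hypothetical rater who uses the whole 1-9 scale."""
    return max(1, min(9, round(5 + 1.5 * impression)))


def normed_rating(impression):
    """Hypothetical rater following the norm: only an 8 or a 9 is given."""
    return 9 if impression > 0 else 8


rng = random.Random(7)  # fixed seed so the sketch is repeatable
# Assumed standardized true performance, observed by the rater with some noise.
true_performance = [rng.gauss(0, 1) for _ in range(2000)]
impressions = [t + rng.gauss(0, 0.5) for t in true_performance]

r_full = pearson([full_scale_rating(i) for i in impressions], true_performance)
r_normed = pearson([normed_rating(i) for i in impressions], true_performance)
print(f"full scale: r = {r_full:.2f}; norm-restricted: r = {r_normed:.2f}")
```

Under these assumptions the norm-restricted ratings track true performance less closely than the full-scale ratings, even though both raters start from the same observations. The observational heuristic is unchanged; only the decision heuristic has been simplified.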


Returning  to  Figure  1,  two  other  approaches  or  perspectives  were  used  to  generate  the 
classification  scheme.  The  first  of  these  perspectives  was  borrowed  from  test  score  theory.  The 
model  chosen  was  Spearman's  classic  test  score  model  because  of  its  simplicity  and  wide 
dissemination.  The  notion  that  an  observed  score,  a  performance  measurement  score,  could  be 
divided  into  true  and  error  components  suggested  an  approach  to  examine  the  impact  of  the 
variables  that  were  identified  on  the  quality  of  the  measure.  This  approach  also  allowed  an 
examination  of  the  literature  in  terms  of  those  variables  that  affect  true  variance  and  those 
that  affect  only  error  variance  in  the  performance  measurement  situation.  In  approaching  the 
literature  in  this  way,  it  became  obvious  that  one  would  want  to  minimize  those  factors  that 
affect  only  error  variance,  while  increasing  the  impact  of  those  factors  that  influence  primarily 
true  variance.  If  the  literature  is  examined  in  this  manner,  the  resulting  integration  has  clear 
implications  for  future  research  strategies. 
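
The decomposition referred to above can be written out as follows. The notation (X for the observed performance score, T for the true component, E for the error component) is the conventional one from classical test theory and is not taken from this paper; the variance decomposition assumes that true and error components are uncorrelated.

```latex
\begin{align}
X &= T + E, \qquad \operatorname{Cov}(T, E) = 0 \\
\sigma_X^2 &= \sigma_T^2 + \sigma_E^2 \\
\rho_{XX'} &= \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
\end{align}
```

In these terms, a variable that affects only the error variance lowers the ratio in the last line, while a variable that increases the true variance raises it.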

The  second  perspective  used  to  refine  the  classification  scheme  was  to  organize  into 
categories  the  many  variables  that  affect  measurement  quality.  This  allowed  an  identification  of 
the  linkages  in  the  system,  for  use  in  classifying  the  empirical  literature  in  an  organized 
fashion.  These  linkages  also  provided  a  framework  for  classification  of  needed  research  in  the 
AFHRL  program.  This  allowed  a  careful  examination  and  prioritization  of  the  total  research 
domain.  The  results  of  this  classification  of  variables  are  contained  in  Table  1.  This  is  a 
fairly  exhaustive  list  of  the  variables  that  have  been  empirically  demonstrated  to  affect 
performance  measurement  quality. 

Returning  to  Figure  1,  perhaps  the  most  critical  input  variable,  in  terms  of  its  impact  on 
rating  quality,  is  the  measurement  purpose.  As  Zedeck  and  Cascio  (1982)  have  noted,  if  a 
measurement  system  is  to  be  used  for  promotion  or  pay  increases,  it  creates  an  entirely  different 
situation  in  terms  of  the  quality  of  the  measurement  than  if  the  system  is  used  for  validation 
research  purposes.  In  fact,  if  the  conceptual  framework  in  Figure  1  is  constrained  in  terms  of 
the  purpose  for  which  it  is  being  developed  and  used,  it  changes  the  strength  or  importance  of 
the individual/system characteristics with respect to their impact on measurement quality. For
example,  the  pay-performance  relationship  with  measurement  quality  is  extremely  important  in  a 
measurement  system  being  used  for  administrative  purposes,  but  has  little  effect  if  the  system  is 
used  only  for  validation  research. 

According to Kavanagh (1982), there are four major purposes for performance measurement
systems: (a) administrative decisions, (b) employee growth and development, (c) validation
research, and (d) requirement to meet legal guidelines. The strength of the relationships
between the individual/system characteristics and measurement quality changes as a function of
changing the purpose. A good analogy is to consider the system characteristics as independent
variables, and measurement quality as the dependent variable in a multiple regression equation.
The intervening process variables could also be included, and thus the equation would have
moderator terms. One would expect the beta weights for the system characteristics and the
process variables to change as the measurement purpose changes. Since the initial thrust of the
Air Force effort is for validation research only, it is necessary to impose this measurement use
restriction and to make the resulting changes to the general classification scheme in Figure 1.
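
The regression analogy can be made concrete with a short sketch. Everything in it is illustrative: the two sets of weights (ADMIN and RESEARCH), the simulated data, and the helper names are hypothetical and are not estimates from any study. One system characteristic and one process variable are used, with their product as the moderator term.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(system, process, quality):
    """Least-squares weights for [intercept, system, process, system*process]."""
    rows = [[1.0, s, p, s * p] for s, p in zip(system, process)]
    k = 4
    # Normal equations: (X'X) w = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, quality)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def simulate(weights, n=500, noise=0.1, seed=0):
    """Generate standardized predictors and a quality score from given weights."""
    rng = random.Random(seed)
    system = [rng.gauss(0, 1) for _ in range(n)]
    process = [rng.gauss(0, 1) for _ in range(n)]
    b0, b1, b2, b3 = weights
    quality = [b0 + b1 * s + b2 * p + b3 * s * p + rng.gauss(0, noise)
               for s, p in zip(system, process)]
    return system, process, quality


# Hypothetical weights: the system characteristic (say, the pay-performance
# link) matters for administrative use but hardly at all for research use.
ADMIN = (0.0, 0.8, 0.3, 0.4)
RESEARCH = (0.0, 0.1, 0.3, 0.0)

admin_fit = fit_weights(*simulate(ADMIN, seed=0))
research_fit = fit_weights(*simulate(RESEARCH, seed=1))
print("administrative purpose:", [round(w, 2) for w in admin_fit])
print("validation research:   ", [round(w, 2) for w in research_fit])
```

The same predictors enter both equations; only the weights, including the weight on the moderator term, differ by purpose. That is the sense in which restricting the scheme to validation research changes the model.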


Table 1. Variables That Can Impact on Job Performance Measurement Quality

1. Individual characteristics
   a. Cognitive variables: Rater or ratee
   b. Rater/ratee intelligence
   c. Rater/ratee knowledge of the job being evaluated
   d. Rater/ratee personal characteristics
   e. Rater/ratee interpersonal trust

2. Relationship between ratee and rater/observer
   a. Sex congruence
   b. Race congruence
   c. Job tenure together
   d. Age congruence
   e. Off-the-job relationship
   f. History of conflict or cooperation

3. Method/Source of measurement
   a. Supervisor ratings
   b. Peer ratings
   c. Self-ratings
   d. Subordinate ratings
   e. Assessment center (team) ratings
   f. Work samples/simulations
   g. Productivity records

4. Scale development
   a. Critical incidents used
   b. Based on job description/job requirements
   c. Employee participation
   d. Top management support during development

5. Rating scale characteristics
   a. Content of the scale
   b. Anchors versus no anchors
   c. Behaviors versus traits
   d. Format type
   e. Number of anchors/scale points
   f. Single versus multiple dimensions
   g. Scaling metric/approach

6. Performance standards/goals
   a. Present or not
   b. Standards versus goals
   c. Participatively set and communicated
   d. Specificity of behavior or accomplishment expected

7. Social context
   a. Performance level of others in work group
   b. Existence of group norms
   c. Rater's status in group
   d. Ratee's status in group

8. Non-job variables
   a. Marital status
   b. Pre-school children at home
   c. Dual career family
   d. Participation in company activities off the job
   e. Stressful life events in recent past

9. Performance constraints
   a. Poor information
   b. Equipment deficiency
   c. Supplies deficiency
   d. Time limitations
   e. Poor work environment

10. Organizational/unit norms
   a. Expectation of certain level of performance by upper management
   b. Expectation by immediate supervisor regarding level of performance
   c. Presence of a union
   d. Pay/rewards tied to performance levels by contract
   e. Pay/rewards tied to performance levels by informal norms

11. Public relations/administrative procedures
   a. Required or not
   b. Mode of presentation
   c. Content of procedure

12. Rater training
   a. Content of training
   b. Format of training
   c. Length of training

13. Measurement purpose
   a. Validation research only
   b. Employee growth and development
   c. Administrative purposes such as rewards
   d. To meet legal guidelines

14. Performance feedback
   a. Required or not
   b. Sources of feedback
   c. Participative
   d. Clarity of feedback
   e. Frequency of feedback

15. Pay-performance relationship
   a. Are they related in the system?
   b. Equity of the relationship

III.  VALIDATION  RESEARCH  CLASSIFICATION  SCHEME 


The performance measurement classification scheme for validation research is depicted in
Figure 2. The dependent variable, measurement quality, is something that has not been clearly
defined in the literature. Different researchers (e.g., Borman, 1975, 1979; Kavanagh, MacKinney,
& Wolins, 1971; Latham, Wexley, & Pursell, 1975) have used different criteria to assess
measurement quality, and some of these criteria are listed in Table 2. Accuracy and construct
validity are the crucial criteria to judge the quality of the measurement of job performance, and
the literature that has relied only on the other criteria cannot be used to reach definitive
scientific conclusions. The first four criteria are seen as important, but less critical, in
that satisfying the requirements of the first four criteria does not guarantee the measure will
meet accuracy/construct validity requirements. However, an accurate and construct valid measure
will in all likelihood meet a number of these other criteria as well. In other words, if it can
be demonstrated that a given performance measure is accurate, the evidence for the other quality
indices will likely be positive. This logic is consistent with current theory in measurement
(Nunnally, 1978) and performance ratings (Wherry & Bartlett, 1982).

Another aspect of Figure 2 that is critically important is the measurement method input
variable. Like the variable of measurement purpose in Figure 1, constraining the orientation
presented in Figure 2 by measurement method should affect the relationships within the model. If
this is so, it may indicate that different measurement methods are capturing different pieces of
the performance criterion space. That is, as noted by Borman (1974) and Schneier (1977),
supervisory ratings may well be assessing a different portion of the total job performance
criterion space than are peer ratings, self-ratings, work sample tests, or objective indices of
productivity. This is not meant to imply that there is no overlap among these methods in the
part of the criterion space they are measuring; however, they are all measuring some unique
aspects of the criterion space that have been treated frequently as error in research.

In the typical research to validate multiple measures of job performance, one or more methods
have been eliminated because of low intercorrelations with the other methods. These low
intercorrelations between different methods have been assumed to be the result of error in the
measures. In a conceptualization of different measurement methods measuring different parts of
the criterion space with differing degrees of fidelity, low correlations between measures may not
indicate error. Thus, it can be argued that the typical validation approach may not be the best
for assessing measurement quality.

These issues lead directly to the question of the content of the criterion space -- an issue
that must first be addressed prior to explaining the importance of different measuring methods.
The content issue addresses the question: "What is it, in terms of performance dimensions, that
constitutes job performance?" In examining the literature, and from personal experience with
developing performance measurement systems, approximately 12 to 15 performance dimensions have
been identified that appear consistently. Furthermore, these dimensions seem to fit into two
general categories or areas that define job performance: technical competence skills and
job-relevant interpersonal skills. Although this may seem to be oversimplifying the criterion,
it is supported by factor analytic studies in which two factors, roughly representing these two
broad skill areas, have emerged (Borman, 1981; Borman, Mendel, Lammlein, & Rosse, 1981).
Simplifying the criterion space in this manner allows scientists to better specify the needed
research in the AFHRL criterion development project.


Table 2. Quality of Job Performance Measurement Criteria

1. Psychometric errors: Halo, Leniency, Range Restriction
2. Inter-rater reliability
3. Content validity
4. Discriminability: Separates individuals in terms of performance levels
5. Construct validity
6. Accuracy


To return to the idea of multiple methods research, the following methods of measuring job
performance are the most frequently used: (a) supervisory ratings, (b) peer ratings, (c)
self-ratings, (d) work samples, and (e) objective indices of productivity. The first three are
widely used and will be applied to the present effort. However, rather than using a traditional
work sample methodology, an alternative to this approach will be developed and tested. The new
methodology is called Walk-Through Performance Testing (WTPT) and is being developed specifically
for this project at the AFHRL. The WTPT methodology combines aspects of both observer
interviewing and work sampling but, in addition, is designed to overcome certain limitations
associated with the generic tasks used with work sampling. The method will be developed by
accessing the Air Force database (see Christal, 1974) that contains information on the tasks
performed in enlisted specialties. These tasks will form the basic content of the measurement
scale. WTPT administrators will be trained to use these scales in terms of incumbent effective
and ineffective performance on each of the tasks. The administrators will then examine the job
incumbent by asking the person to perform certain tasks or to explain task procedures for that
job. The administrators will record the person's behavior or answers on a rating checklist of
tasks. The important characteristic of this method is that the job is being reduced to its
smallest parts at the task level and will include not only a core set of tasks, but a series of
unique tasks as well. Thus, this method will constitute an examination of job performance at the
most "micro" level.

It is believed that the WTPT method will assess, with a high degree of fidelity, technical
skills and competence -- one-half of the criterion space. In fact, walk-through testing may be
one method that will allow removal of the interpersonal/social aspects of the job situation.
However, as currently planned, it may be less accurate in assessing the job-relevant
interpersonal skills side of the criterion space. On the other hand, supervisory ratings may be
quite good at assessing interpersonal skill, but not very accurate in measuring technical skills,
particularly if the job is one that has had significant changes in technology or is one in which
the supervisor has never had direct work experience. The same logic can be applied for the other
three methods. They are all measuring portions of the criterion space; however, they differ in
their fidelity or accuracy of measurement across the different parts.

This does not mean that one or more of the methods could not be modified to assess both major
parts of the job performance criterion space. The WTPT method could be modified, for example, to
measure interpersonal skills. However, this modification may not be cost-effective if another
method is available that can accurately assess the interpersonal skills without modification
(e.g., peer ratings). The central point of this argument is that the five methods, as they are
currently used, assess different parts of the criterion space with differing degrees of
accuracy. Any research in this program of criterion development must recognize this fact, and
research must be designed with this point in mind.

This type of logic makes the typical multimethod validation study problematic. If job
performance is measured with two or more methods, and zero-order correlations are calculated
among the methods, the conclusions regarding the validity of the methods are based in part on
significant non-zero correlations between methods for common traits. Methods showing
non-significant values are rejected. However, if, as has been argued, the methods are not
assessing the same portions of the criterion space with equal fidelity, then there is no reason
to expect them to be correlated.
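
The argument can be illustrated with a small simulation. The sketch below is not part of the
research program; it assumes, purely for illustration, that technical competence and
interpersonal skill are independent, that a hypothetical "method A" measures only the former and
a hypothetical "method B" only the latter, and that both carry normally distributed error. The
names, sample size, and error level are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=2000, noise_sd=0.5, seed=0):
    """Two methods, each a faithful measure of a different part of the space."""
    rng = random.Random(seed)
    # Assumption: the two parts of the criterion space are independent.
    technical = [rng.gauss(0, 1) for _ in range(n)]
    interpersonal = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical methods: true score on one part plus measurement error.
    method_a = [t + rng.gauss(0, noise_sd) for t in technical]
    method_b = [i + rng.gauss(0, noise_sd) for i in interpersonal]
    return {
        "a_with_technical": pearson(method_a, technical),
        "b_with_interpersonal": pearson(method_b, interpersonal),
        "a_with_b": pearson(method_a, method_b),
    }


results = simulate()
for label, r in results.items():
    print(f"{label}: {r:.2f}")
```

Under these assumptions each method correlates strongly with the part of the criterion space it
targets, while the correlation between the two methods stays near zero. A study that rejected
methods for failing to correlate with one another would discard both.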

The extension of the logic that the different methods measure different portions of the
criterion space with varying degrees of fidelity leads directly to the idea of specifying the
construct space for job performance in terms of what Cronbach and Meehl (1955) have termed a
nomological network (a network of relations that are tied to observables and, hence, are
empirically testable). In this framework, the measures are the observables, and the construct is
used to account for relationships among them. This suggests the use of one of the techniques for
construct validation discussed by Nunnally (1978), namely, that of testing the a priori
hypothesized relationships within a construct space with empirical data. In order to do this,
the two major parts of the criterion space, technical competence and interpersonal skills, must
be better specified in terms of the job performance dimensions that comprise these general
categories.

With the delineation of a multidimension-multimethod matrix, the next step in this research
strategy would be to hypothesize the expected level of relationship between each method-dimension
and all others. Some might be high, others moderate, and some zero. In this manner, one would
have specified, a priori, the hypothesized nomological net for these methods and the criterion
space. After collecting data, the results would be examined to verify the expected
correlations. In this strategy, a zero relationship would be as important as a non-zero one in
establishing the construct validity of the methods of measurement.
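
The bookkeeping for such a test might be sketched as follows. This is only an illustration: the
cut-offs that separate "zero," "moderate," and "high," the method-dimension cells, and the
observed values are all hypothetical and are not taken from this program. In practice the check
would rest on significance tests or confidence intervals rather than fixed cut-offs.

```python
def level_of(r):
    """Classify a correlation by magnitude using assumed, illustrative cut-offs."""
    size = abs(r)
    if size < 0.15:
        return "zero"
    if size < 0.45:
        return "moderate"
    return "high"


def check_net(hypothesized, observed):
    """Return the pairs whose observed level differs from the a priori level."""
    mismatches = []
    for pair, expected in hypothesized.items():
        found = level_of(observed[pair])
        if found != expected:
            mismatches.append((pair, expected, found))
    return mismatches


# Each key pairs two (method, dimension) cells of the matrix.
hypothesized = {
    (("WTPT", "technical"), ("supervisor", "technical")): "moderate",
    (("WTPT", "technical"), ("peer", "interpersonal")): "zero",
    (("supervisor", "interpersonal"), ("peer", "interpersonal")): "high",
}

# Made-up correlations, for illustration only.
observed = {
    (("WTPT", "technical"), ("supervisor", "technical")): 0.38,
    (("WTPT", "technical"), ("peer", "interpersonal")): 0.05,
    (("supervisor", "interpersonal"), ("peer", "interpersonal")): 0.30,
}

mismatches = check_net(hypothesized, observed)
for pair, expected, found in mismatches:
    print(pair, "expected", expected, "but found", found)
```

Note that the hypothesized zero relationship counts as confirmed when the observed value is near
zero; only the third pair, which falls short of its predicted level, is flagged.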

For the Air Force research program, this empirical construct validation project cannot be
done until at least the fifth or sixth year. First, the methods must be properly researched and
refined so that this approach to construct validation is not subject to criticism for poorly
designed measures. One implication of this line of thinking for the Air Force program is that
there is a tremendous amount of research to be done during the first 4 or 5 years of the program
before this type of study can be conducted.


IV.  SUCCESSIVE  APPROXIMATION:  A  RESEARCH  STRATEGY 

Another research strategy that will underlie the Air Force program is the notion of
successive approximations to high fidelity measures. It is expected that, as currently
envisioned, WTPT will accurately assess technical job skills of individuals. This method is
quite time-consuming and expensive, particularly if it is going to be used for large-scale data
collection across the Air Force. If, after the appropriate research, it is believed that WTPT
does have high fidelity for technical skills, thorough research can be attempted to determine
which one of the less expensive job performance data collection methods most closely approximates
WTPT. It may be, for example, that some combination of peer and self-ratings is a close enough
approximation to WTPT, for the technical job skills, to allow the walk-through testing method to
be dropped in favor of these less expensive ones. Also, as earlier noted, WTPT may be modified
to measure interpersonal skills. If it can be modified to measure accurately these different
parts of the job performance criterion space, the same research strategy of successive
approximation with less costly and time-consuming methods can be pursued.
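
One simple way to operationalize "most closely approximates" is to rank the less expensive
methods by their correlation with WTPT scores for the same incumbents. The sketch below assumes
that criterion and uses made-up scores for eight hypothetical incumbents; the function name and
the data are illustrative only, and a composite of methods (e.g., peer plus self-ratings) could be
entered as one more candidate.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_approximations(benchmark, candidates):
    """Rank candidate methods by correlation with the benchmark, best first."""
    scored = [(pearson(scores, benchmark), name) for name, scores in candidates.items()]
    scored.sort(reverse=True)
    return [(name, r) for r, name in scored]


# Made-up technical-skill scores, for illustration only.
wtpt = [52, 61, 70, 58, 83, 77, 65, 90]
candidates = {
    "peer": [3.1, 3.4, 3.9, 3.2, 4.6, 4.1, 3.8, 4.8],
    "self": [4.0, 3.5, 4.2, 4.4, 4.1, 3.9, 4.5, 4.3],
    "supervisor": [3.0, 3.8, 3.5, 3.6, 4.2, 3.7, 3.9, 4.4],
}

ranking = rank_approximations(wtpt, candidates)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

Whether the best-ranked method is "close enough" to replace WTPT remains a judgment that depends
on the cost savings and on the accuracy required for the intended use.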

V.  TIME-TO-PROFICIENCY 

An additional variable, time-to-proficiency on job tasks, may need to be incorporated into
the research program. It appears that it might be embedded within all of the methods of
measurement. Research will be necessary to determine how best to adapt the five methods to
measure this crucial part of job performance. Wide individual differences on this variable would
be expected in newly assigned personnel, particularly in their first job in the military.
Furthermore, it may be that some of the methods are able to measure this variable across task
performance with varying degrees of fidelity/accuracy. In fact, there probably is a
task/dimension by method effect in terms of accuracy of assessing time-to-proficiency. The
important point is that this variable must be included in any research project. As a footnote,
collecting this type of performance data may be most appropriate for the purpose of validating
the learning/acquisition rate measures that are being collected in the Learning Abilities
Research Function at AFHRL.

In terms of validating the ASVAB, this type of logic has certain implications. Validating
the ASVAB against training school criteria has not been totally effective because training school
success reflects only a part of the total job performance criterion space, namely, initial
acquisition of technical competence and skills in a school environment. Since the ASVAB has been
designed to measure and predict rate of learning on job skills, it seems reasonable that it
should predict training school success. However, job performance in the military also includes
job-relevant interpersonal skills, and thus, using the measurement of total job performance to
validate the ASVAB may be less than ideal. In fact, measures of technical competence on the
first military job, by whatever method is found to have the greatest accuracy, would probably be
the most appropriate criterion on which to validate the ASVAB. An extension of this logic would
indicate that other predictors are needed (e.g., vocational interests) for selecting persons who
will succeed in terms of the job-relevant interpersonal skills.


VI.  CONCLUSIONS  AND  RECOMMENDATIONS

The classification scheme pictured in Figure 2 will be used to summarize and organize the
previous research as well as to specify needed future research. If a line is drawn between any
of the variables (either individual/system characteristics or the process variables) and the
dependent variable, a linkage in the system is defined. These linkages will be used to organize
the literature search as well as to make recommendations for needed research in the Air Force
program. The literature will be categorized in the appropriate linkage in the model, and we
should be able to draw some conclusions regarding the known empirical "facts" within each
linkage. Additional research needs for effective criterion development can then be identified.
For example, in the linkage involving rater training, it is known that some rater training is
necessary to ensure the accuracy and general quality of the ratings for this method (see Borman,
1980; Landy & Farr, 1980). This training could be quite minimal, e.g., an explanation of how to
complete the form. It is clear from the literature that the absence of rater training has a
negative impact on the quality of the measurement. Note that this argument also applies to
observer training in the walk-through testing method. It is not known what specific type of
rater/observer training is best in terms of obtaining accurate measures.

Implementation of the procedures described in the previous paragraph will result in a
technical report containing the following sections: (a) a background and description of the
problem of criterion development, (b) a description of the approach to model development (mostly
detailed in this paper), (c) a review of the major and relevant literature organized by the
linkages in the classification scheme, (d) a detailed specification of the "known" facts and
needed research for successful criterion development, and (e) a discussion of the research
priorities, both in terms of importance and time sequencing, for the Air Force program of
research.